¹ GenomEast platform, IGBMC

1 - Training set up

1.1 - Datasets analyzed during this training

For this training, we will use two datasets:

  • datasets produced by Achour et al. (PubMed). In this project, the authors analyzed transcriptomic (RNA-seq) and epigenomic (ChIP-seq) data from the striatum of Huntington’s disease mice. We will focus on the RNA-seq data.

The data are publicly available in GEO under accession number GSE59572. This series contains two subseries:

1.2 - Tools used during this training

1.2.1 - Galaxy

Bioinformatics tools will be run through the French instance of Galaxy, Galaxy France, to analyze the data.

1.2.2 - IGV

The genome browser IGV will be used to visualize the data in their genomic context.

1.2.3 - BioJupies

BioJupies will be used to run the differential expression analysis.

2 - Prepare your Galaxy environment

Galaxy is a platform that allows users to run bioinformatics tools on a high-performance computing cluster through a simple web interface. We are going to use the French instance of Galaxy, Galaxy France.

2.1 - Log in to Galaxy

Go to the Galaxy France website, https://usegalaxy.fr/, and log in with your personal account.

2.2 - Import a public history that contains data analyzed during this training

Data analyzed during this training are available in a public history: https://usegalaxy.fr/u/stephanie/h/neuro-epigenetics-training-data. Import this history.

2.2.1 - Browse to the history named “Neuro-epigenetics training (data)”

2.2.2 - Import the history

2.2.3 - Create a new working history

2.2.4 - Name the new history “Neuro-epigenetics training”

2.2.5 - Import raw data (fastq files) from the imported history to the newly created history “Neuro-epigenetics training”

The datasets are in the imported history “Imported: Neuro-epigenetics training (data)”.

  • Click on the downward arrow at the top right of your history panel and select “Show History Side-by-Side”

  • Drag and drop the datasets R6_1_387_St.chr19.fastq.gz and WT_320_St.chr19.fastq.gz from the imported history to the working one.
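If you also keep local copies of the two fastq files, a quick sanity check is to count the reads they contain. The sketch below is only an illustration and is not part of the Galaxy workflow: it assumes standard four-line FASTQ records (no wrapped sequence lines), and the helper name count_fastq_reads is made up for this example.

```python
import gzip


def count_fastq_reads(path):
    """Count the reads in a gzipped FASTQ file.

    Assumption: every record uses exactly 4 lines
    (header, sequence, '+', qualities).
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Example call (requires a local copy of the file):
# count_fastq_reads("WT_320_St.chr19.fastq.gz")
```

A line count that is not a multiple of four usually points to a truncated download.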

3 - Analysis of RNA-seq data

Analysis of RNA-seq data will be run with the following steps:

  • [Galaxy] Quality controls
  • [Galaxy] Mapping
  • [Galaxy] Generation of visualization tracks
  • [IGV] Visualization of the data
  • [not done] Generation of the per-gene count matrix
  • [BioJupies] Differential expression analysis

3.1 - Quality controls

Tool: FastQC

Website: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Citation: Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Use of the tool: FastQC assesses the quality of high-throughput sequencing data. It takes raw sequencing data (fastq files) or mapping results (BAM or SAM files) and generates an HTML report whose summary graphs give a quick overview of data quality.
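One of the summaries in a FastQC report is the per-base sequence quality plot. To make concrete what that plot is built from, here is a minimal sketch that computes the mean Phred score at each read position. It is not FastQC's implementation; it assumes four-line FASTQ records and Phred+33 quality encoding, and the function name is hypothetical.

```python
import gzip
from collections import defaultdict


def mean_quality_per_position(path, phred_offset=33):
    """Return the mean Phred quality for each read position.

    Assumptions: 4-line FASTQ records and Phred+33 encoding.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 3:
                continue  # only the 4th line of a record holds qualities
            for position, char in enumerate(line.rstrip("\n")):
                totals[position] += ord(char) - phred_offset
                counts[position] += 1
    return [totals[i] / counts[i] for i in sorted(counts)]
```

A drop in the mean towards the end of the reads is the typical pattern to look for in the corresponding FastQC graph.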

3.1.1 - Search for the term “fastqc” in the top left search field and click on the tool name “FastQC”

3.1.2 - Run FastQC on the WT sample (WT_320_St.chr19.fastq.gz)

3.1.3 - Do the same on the R6/1 sample

3.2 - Mapping

Tool: STAR

Documentation: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Citation: Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905.

Use of the tool: STAR maps RNA-seq reads to the reference genome very quickly. It uses known transcript junction information to align reads but can also discover new splice junctions.

3.2.1 - Add the dataset that describes transcript structure (Mus_musculus.NCBIM37.67_UCSConlychr.gtf) to your current history

  • The dataset is in the imported history “Neuro-epigenetics training (data)”.

  • Click on the downward arrow at the top right of your history panel and select “Show History Side-by-Side”

  • Drag and drop the dataset Mus_musculus.NCBIM37.67_UCSConlychr.gtf from the imported history to the working one.
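A GTF file describes transcript structure as one feature (for example an exon) per line, with nine tab-separated fields and a final attribute field holding key/value pairs. The following sketch parses a single line to show that layout. It assumes attribute values are double-quoted, and the identifiers in the example line are placeholders, not entries taken from the training file.

```python
import re

GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes")
# Assumption: attributes look like  key "value";  with quoted values.
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF line into a dict; attributes become a nested dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF line must have 9 tab-separated fields")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = dict(ATTRIBUTE_PATTERN.findall(record["attributes"]))
    return record


# Placeholder line, for illustration only.
example = ('chr19\tprotein_coding\texon\t100\t200\t.\t+\t.\t'
           'gene_id "GENE0001"; transcript_id "TX0001"; exon_number "1";')
print(parse_gtf_line(example)["attributes"])
```

Exons that share a transcript_id together define the splice junctions that the aligner can use as known annotation.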

3.2.2 - Run STAR to map the reads to the genome
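In Galaxy, STAR is configured through the tool form. For readers curious about what a comparable command-line run could look like, the sketch below assembles STAR argument lists without executing them. The option names are standard STAR options, but the index directory, FASTA file name, read length, and output prefix are assumptions made for illustration; the exact settings used in this training are not specified here.

```python
import shlex


def star_index_command(genome_dir, fasta, gtf, read_length, threads=4):
    """Arguments to build a STAR genome index that includes known junctions."""
    return ["STAR", "--runMode", "genomeGenerate",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            # The STAR manual suggests read length minus 1 for this option.
            "--sjdbOverhang", str(read_length - 1)]


def star_align_command(genome_dir, fastq_gz, prefix, threads=4):
    """Arguments to align one gzipped single-end FASTQ file."""
    return ["STAR", "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", fastq_gz,
            "--readFilesCommand", "zcat",
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", prefix]


# "mm9_star_index" and the prefixes are hypothetical names.
for sample in ("WT_320_St.chr19", "R6_1_387_St.chr19"):
    command = star_align_command("mm9_star_index", sample + ".fastq.gz", sample + ".")
    print(shlex.join(command))
```

Building the command as a list rather than a single string keeps file names with unusual characters safe if the list is later passed to subprocess.run.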

3.2.3 - Import the two datasets WT_320_St.chr19.bam and R6_1_387_St.chr19.bam to your working history.

Because mapping takes a long time, the mapping results are provided in the imported history “Neuro-epigenetics training (data)”.

  • Click on “Show history options” > “Show History Side-by-Side”
  • Drag and drop the two datasets from the imported history to the working one.

3.3 - Generation of visualization tracks

Tool: deepTools bamCoverage

Documentation: https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html

Citation: Ramírez, Fidel, Devon P. Ryan, Björn Grüning, Vivek Bhardwaj, Fabian Kilpert, Andreas S. Richter, Steffen Heyne, Friederike Dündar, and Thomas Manke. deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis. Nucleic Acids Research (2016). doi:10.1093/nar/gkw257.

Use of the tool: deepTools is a suite of tools for processing next-generation sequencing data, especially ChIP-seq and RNA-seq data. Some of its tools create plots that give a global view of the data.

3.3.1 - Use the tool bamCoverage to generate comparable signal tracks from mapping data
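Signal tracks from two samples are only comparable once they are scaled for sequencing depth. bamCoverage offers several normalization methods through its --normalizeUsing option; which one is selected in this training is not stated here, so the sketch below assumes counts per million (CPM). It only illustrates the arithmetic and is not the deepTools implementation; all numbers are toy values.

```python
def cpm_normalize(bin_counts, total_mapped_reads):
    """Scale per-bin read counts to counts per million mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


# Toy values: sample B was sequenced twice as deeply as sample A.
sample_a = cpm_normalize([10, 0, 5], total_mapped_reads=2_000_000)
sample_b = cpm_normalize([20, 0, 10], total_mapped_reads=4_000_000)
print(sample_a, sample_b)
```

After scaling, both toy samples show the same signal in every bin, which is the property that makes the tracks comparable in a genome browser.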

3.4 - Visualization of the data

3.4.1 - Download mapping files

Do it for the two files WT_320_St.chr19.bam and R6_1_387_St.chr19.bam

3.4.2 - Download the result files (output of bamCoverage)

Do it for the two result datasets.

3.4.3 - Launch IGV and select the assembly mm9

3.4.4 - Load the BAM files and the bigWig files

Note: the index files (.bam.bai) must be in the same directory as the BAM files; otherwise the BAM files will not load!
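Before opening IGV, you can check that every downloaded BAM file has its index next to it. This is a minimal sketch, assuming the index is named either file.bam.bai or file.bai; the function name is hypothetical.

```python
from pathlib import Path


def find_missing_indexes(directory):
    """Return the names of BAM files in `directory` that lack an index."""
    missing = []
    for bam in sorted(Path(directory).glob("*.bam")):
        candidates = (
            bam.with_name(bam.name + ".bai"),  # sample.bam.bai
            bam.with_suffix(".bai"),           # sample.bai
        )
        if not any(candidate.exists() for candidate in candidates):
            missing.append(bam.name)
    return missing


# Example call on the download folder (path is an assumption):
# find_missing_indexes("Downloads")
```

An empty list means all BAM files in the folder can be loaded.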

In the IGV menu:

Select the BAM files and the bigWig files.

You should get:

3.4.5 - Go to chromosome 19

3.4.6 - Select the two bigWig tracks

3.4.6.1 - Set them to the same scale: right-click the selected tracks and select Group Autoscale

3.4.6.2 - Set the windowing function to Maximum: right-click the selected tracks and, under Windowing Function, select Maximum

3.4.7 - Go to the Syt12 gene
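The loading and navigation steps above can also be scripted with an IGV batch file, which is convenient when the same view has to be reproduced several times. The sketch below writes such a script using the IGV batch commands new, genome, load and goto. The bigWig file names are hypothetical (they depend on how you named the bamCoverage outputs), and track display settings such as Group Autoscale are not covered.

```python
def igv_batch_script(genome, files, loci):
    """Build the text of an IGV batch script that loads files and visits loci."""
    lines = ["new", f"genome {genome}"]
    lines += [f"load {path}" for path in files]
    lines += [f"goto {locus}" for locus in loci]
    return "\n".join(lines) + "\n"


script = igv_batch_script(
    genome="mm9",
    files=[
        "WT_320_St.chr19.bam",
        "R6_1_387_St.chr19.bam",
        "WT_320_St.chr19.bw",    # hypothetical bigWig name
        "R6_1_387_St.chr19.bw",  # hypothetical bigWig name
    ],
    loci=["chr19", "Syt12"],
)
print(script)
```

Save the printed text to a file and run it from IGV via Tools > Run Batch Script.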

3.5 - Generation of the per-gene count matrix

3.5.1 - The matrix of read counts per gene is available on the GEO website

The matrix has been downloaded from GEO and is available in the file data/GSE59571_S13113_readCounts.xlsx. We are going to run a differential expression analysis on these data.

3.6 - Differential expression analysis

3.6.1 - Use the matrix in the tool BioJupies to run a differential expression analysis

Tool: BioJupies

Website: https://maayanlab.cloud/biojupies/

Documentation: https://maayanlab.cloud/biojupies/help

Citation: Torre D, Lachmann A, Ma’ayan A. BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud. Cell Syst. 2018 Nov 28;7(5):556-561.e3. doi: 10.1016/j.cels.2018.10.007. Epub 2018 Nov 14. PMID: 30447998; PMCID: PMC6265050.

Use of the tool: BioJupies is a web application for RNA-seq data analysis. Through an intuitive interface, users can rapidly generate tailored reports to analyze and visualize their own raw sequencing files, gene expression tables, or fetch data from >9,000 published studies containing >300,000 preprocessed RNA-seq samples.

3.6.2 - Start an analysis with BioJupies

The notebook created for this training is available here: https://maayanlab.cloud/biojupies/notebook/3bDxb3Opy (you can also run the analysis yourself to generate the report).

3.6.3 - Download the list of deregulated genes. Is Syt12 significantly deregulated?
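Once the list is downloaded, answering the question comes down to looking up the gene and comparing its adjusted p-value with a threshold. The sketch below shows this on a tiny made-up table. The column names (gene, logFC, adj_p), the tab delimiter, and the 0.05 cutoff are assumptions, so adapt them to the header of the file you actually downloaded; the values are invented and say nothing about the real result.

```python
import csv
import io


def load_de_table(handle, delimiter="\t"):
    """Read a differential expression table into a list of dicts.

    Assumption: columns are named 'gene', 'logFC' and 'adj_p'.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        rows.append({
            "gene": row["gene"],
            "logFC": float(row["logFC"]),
            "adj_p": float(row["adj_p"]),
        })
    return rows


def significant_genes(rows, max_adj_p=0.05):
    """Return the set of genes whose adjusted p-value passes the cutoff."""
    return {row["gene"] for row in rows if row["adj_p"] <= max_adj_p}


# Invented toy table with placeholder gene names.
toy_table = io.StringIO(
    "gene\tlogFC\tadj_p\n"
    "GeneA\t1.5\t0.001\n"
    "GeneB\t-0.2\t0.40\n"
)
hits = significant_genes(load_de_table(toy_table))
print("GeneA" in hits, "GeneB" in hits)
```

With the real file, replace the toy table by an open file handle and test whether "Syt12" is in the returned set.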